A Frequent Pattern Mining Method for Finding Planted Motifs of Unknown Length in DNA Sequences
Authors
Abstract
Identifying and characterizing gene regulatory binding motifs is one of the fundamental tasks in systematically understanding the molecular mechanisms of transcriptional regulation. Recently, the problem has been abstracted as the challenging planted (l,d)-motif problem. Previous studies have developed numerous methods to solve it, but most of them require the length l of the planted motif to be specified in advance and use a depth-first search strategy. In this study, we present an exact and efficient algorithm, called Apriori-Motif, that does not require the length l of the planted motif to be given a priori. It uses a breadth-first search strategy to prune the search space quickly by means of the downward closure property exploited in Apriori, a classical algorithm for frequent pattern mining. An empirical study shows that Apriori-Motif performs better than some existing methods.
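The level-wise idea in the abstract can be illustrated with a minimal Python sketch. This is not the Apriori-Motif algorithm itself, whose details are not given here. It assumes that a pattern counts as "frequent" when every input sequence contains a substring of the same length within Hamming distance d of it, and that d is fixed while the length grows. Under that assumption, any contiguous sub-pattern of a frequent pattern is also frequent, which is the downward closure used for pruning. All names (`hamming`, `occurs_within`, `is_frequent`, `levelwise_motifs`) are hypothetical.

```python
from typing import Dict, List, Optional, Set

ALPHABET = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(pattern: str, sequence: str, d: int) -> bool:
    """True if some substring of `sequence` is within Hamming distance d of `pattern`."""
    k = len(pattern)
    return any(
        hamming(pattern, sequence[i:i + k]) <= d
        for i in range(len(sequence) - k + 1)
    )


def is_frequent(pattern: str, sequences: List[str], d: int) -> bool:
    """Assumed quorum: the pattern must occur (within d mismatches) in every sequence."""
    return all(occurs_within(pattern, s, d) for s in sequences)


def levelwise_motifs(
    sequences: List[str], d: int, max_len: Optional[int] = None
) -> Dict[int, Set[str]]:
    """Breadth-first search over pattern lengths; returns {length: frequent patterns}."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    levels: Dict[int, Set[str]] = {}
    current = {c for c in ALPHABET if is_frequent(c, sequences, d)}
    k = 1
    while current and (max_len is None or k <= max_len):
        levels[k] = current
        # Candidate generation: extend each frequent k-pattern by one symbol.
        # The prefix is frequent by construction; keep the candidate only if
        # its length-k suffix is frequent too (downward closure pruning).
        candidates = {
            p + c for p in current for c in ALPHABET if (p + c)[1:] in current
        }
        current = {p for p in candidates if is_frequent(p, sequences, d)}
        k += 1
    return levels
```

Because the motif length is not an input, the search simply stops at the first length with no surviving pattern, and the largest key of the returned dictionary gives the longest patterns found. With d = 0 the sketch reduces to finding substrings common to all sequences. For larger d the number of short patterns can grow quickly before pruning takes effect, so the optional `max_len` bound is included as a practical safeguard.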
Similar resources
Voting algorithms for the motif finding problem.
Finding motifs in many sequences is an important problem in computational biology, especially in the identification of regulatory motifs in DNA sequences. Let c be a motif sequence. Given a set of sequences, each planted with a mutated version of c at an unknown position, the motif finding problem is to find these planted motifs and the original c. In this paper, we study the VM model...
High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences
High fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high utility threshold. It comprises extracting patterns which are highly accessed in mobile web service sequences. Different from the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...
An experimental comparison of two different paradigms in Evolutionary Computation
The DNA motif finding problem is of great relevance in molecular biology. Motifs play an important role in all biological processes since they control the production of certain proteins by turning on and off the genes that code for them. These motifs consist of a short string of unknown length that can be located anywhere throughout the genome. This fact makes the problem much more difficult, so ...
An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases
Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs t...
Survey of Sequential Pattern Mining Algorithms and an Extension to Time Interval Based Mining Algorithm
Sequential pattern mining finds the subsequence and frequent relevant patterns from the given sequences. Sequential pattern mining is used in various domains such as medical treatments, natural disasters, customer shopping sequences, DNA sequences and gene structures. Various sequential pattern mining algorithms such as GSP, SPADE, SPAM and PrefixSpan have been proposed for finding the relevant...
Journal title:
- Int. J. Computational Intelligence Systems
Volume 4 Issue
Pages -
Publication date 2011